Compression of DNA Sequences
Abstract
We propose a lossless algorithm to compress the information contained in DNA sequences. None of the available universal compression algorithms manages to compress such data, owing to the specificity of genetic information. Our method exploits regularities in DNA, such as the presence of palindromes. The results we obtain, although not satisfactory, are far better than those of classical algorithms.
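As a rough illustration of the kind of regularity mentioned above, the following Python sketch looks for a "palindrome" in the sense commonly used in DNA compression: a segment whose reverse complement already occurs earlier in the sequence. That interpretation, the function names, and the `min_len` threshold are assumptions made for this example; the abstract does not describe the actual algorithm.

```python
# Hypothetical helper names; not taken from the paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_complemented_palindrome(seq: str, start: int, min_len: int = 4):
    """Find the longest segment beginning at `start` whose reverse
    complement occurs entirely inside the already-seen prefix seq[:start].

    Returns (earlier_position, length), or None if no segment of at
    least `min_len` bases qualifies.
    """
    best = None
    length = min_len
    while start + length <= len(seq):
        target = reverse_complement(seq[start:start + length])
        pos = seq.find(target, 0, start)  # search only the prefix
        if pos == -1:
            break  # longer segments cannot match either
        best = (pos, length)
        length += 1
    return best


if __name__ == "__main__":
    # "TCCGGT" is the reverse complement of the leading "ACCGGA".
    print(find_complemented_palindrome("ACCGGATCCGGT", 6))
```

A compressor built on this idea could replace such a segment with a (position, length) reference instead of spelling out the bases, which is why these regularities matter for compression.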
Similar resources
GenBit Compress Tool (GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio (bits/base) of Genomes
We present a compression tool, "GenBit Compress", for genetic sequences, based on our newly proposed "GenBit Compress Algorithm". Our tool achieves the best compression ratios for entire genomes (DNA sequences). Significantly better compression results show that the GenBit Compress algorithm is the best among the remaining genome compression algorithms for non-repetitive DNA sequences in genomes. The ...
A Biological sequence Compression based on Look up Table (LUT) using Complementary Palindrome of Fixed size
Data storage costs account for an appreciable proportion of the total cost of creating and analyzing DNA sequences. In particular, the growth in DNA sequence data is remarkable compared with the growth in disk storage capacity. General text compression algorithms do not utilize the specific characteristics of DNA sequences. In this paper we have proposed a compression algorithm based on...
DNA Compression Using Hash Based Data Structure
The DNA sequences making up any organism comprise the basic blueprint of that organism, so understanding and analyzing the different genes within sequences has become an extremely important task. Biologists are producing huge volumes of DNA sequences every day, which makes genome sequence databases grow exponentially. Databases such as EMBL and GenBank represent millions of DNA sequences filling m...
Biological sequence compression algorithms.
Today, more and more DNA sequences are becoming available. Information about DNA sequences is stored in molecular biology databases. The size and importance of these databases will keep growing in the future, so this information must be stored and communicated efficiently. Furthermore, sequence compression can be used to define similarities between biological sequences. The s...
Grammar-based Compression of DNA Sequences
Grammar-based compression algorithms infer context-free grammars to represent the input data. The grammar is then transformed into a symbol stream and finally encoded in binary. We explore the utility of grammar-based compression of DNA sequences. We strive to optimize the three stages of grammar-based compression to work optimally for DNA. DNA is notoriously hard to compress, and ultimately, o...
DNABIT Compress – Genome compression algorithm
Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress", for DNA sequences based on a novel algorithm of assigning binary bits...